Indexing k-mers in linear space for quality value compression
Similar resources
Indexing Arbitrary-Length k-Mers in Sequencing Reads
We propose a lightweight data structure for indexing and querying collections of NGS read data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), based on finding overlapping reads, is competitive with the existing algorithms in the space...
Privately Matching k-mers
We construct efficient protocols for several tasks related to private matching of k-mers (sets of length-k strings). These are based upon the evaluation of functionalities in the levelled homomorphic encryption scheme YASHE, which supports addition and multiplication as SIMD operations. We analyse the correctness and security properties of these protocols as well as their resource costs in terms of...
Linear-Size CDAWG: New Repetition-Aware Indexing and Grammar Compression
In this paper, we propose a novel approach to combine compact directed acyclic word graphs (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs (LCDAWGs), which can be represented with O(ẽ_T log n) bits of space allowing for O(log n)-time random and O(1)-time sequential accesses to edge labels, and O(m log σ + occ)-time pattern matching. Here...
Linear-time string indexing and analysis in small space
The field of succinct data structures has flourished over the last 16 years. Starting from the compressed suffix array by Grossi and Vitter (STOC 2000) and the FM-index by Ferragina and Manzini (FOCS 2000), a number of generalizations and applications of string indexes based on the Burrows-Wheeler transform (BWT) have been developed, all taking an amount of space that is close to the input size...
A Space-Saving Linear-Time Algorithm for Grammar-Based Compression
A space-efficient linear-time approximation algorithm is presented for the grammar-based compression problem, which asks for a smallest context-free grammar deriving a given string. The algorithm consumes only O(g* log g*) space and achieves the worst-case approximation ratio O(log g* log n), where n is the size of the input and g* is the optimum grammar size. Experimental result...
Journal
Journal title: Journal of Bioinformatics and Computational Biology
Year: 2019
ISSN: 0219-7200, 1757-6334
DOI: 10.1142/s0219720019400110